
    Crowdbreaks: Tracking Health Trends using Public Social Media Data and Crowdsourcing

    In the past decade, tracking health trends using social media data has shown great promise, owing to a powerful combination of the massive worldwide adoption of social media and increasingly potent hardware and software that enable us to work with these new big data streams. At the same time, many challenging problems have been identified. First, there is often a mismatch between how rapidly online data change and how rapidly algorithms are updated, so algorithms trained on past data have limited reusability as their performance decreases over time. Second, much of the work focuses on specific issues during a specific past period, even though public health institutions need flexible tools to assess multiple evolving situations in real time. Third, most tools providing such capabilities are proprietary systems with little algorithmic or data transparency, and thus have little buy-in from the global public health and research community. Here, we introduce Crowdbreaks, an open platform that tracks health trends through continuous crowdsourced labelling of public social media content. The system automates the typical workflow of data collection, filtering, labelling and training of machine learning classifiers, and can therefore greatly accelerate the research process in the public health domain. This work introduces the technical aspects of the platform and explores its future use cases.
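
The filtering and labelling stages of the workflow described above can be pictured with a small sketch. This is not the Crowdbreaks implementation. The function names are hypothetical, and keyword filtering plus majority-vote aggregation of crowd labels are assumptions chosen for illustration. Classifier training, the final stage, is left out.

```python
from collections import Counter


def keyword_filter(posts: list[str], keywords: list[str]) -> list[str]:
    """Hypothetical filtering stage: keep posts that mention any tracked keyword (case-insensitive)."""
    kws = [k.lower() for k in keywords]
    return [p for p in posts if any(k in p.lower() for k in kws)]


def aggregate_labels(annotations: dict[str, list[str]], min_votes: int = 3) -> dict[str, str]:
    """Hypothetical labelling stage: majority vote over crowd labels per post id.

    Posts with fewer than `min_votes` labels, or with a tied vote, are left
    unlabelled so that they can be sent back to the crowd (an assumed policy).
    """
    result: dict[str, str] = {}
    for post_id, labels in annotations.items():
        if not labels or len(labels) < min_votes:
            continue  # not enough votes yet
        (top, top_count), *rest = Counter(labels).most_common()
        if rest and rest[0][1] == top_count:
            continue  # tie between the two most common labels
        result[post_id] = top
    return result


if __name__ == "__main__":
    posts = ["Got my flu shot today", "Nice weather"]
    print(keyword_filter(posts, ["flu"]))
    print(aggregate_labels({"a": ["pos", "pos", "neg"], "b": ["pos", "neg"]}))
```

Requiring a minimum vote count and skipping ties trades label coverage for label reliability. A continuously running platform could simply re-queue such posts for more annotations.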

    A cross-correlation-based estimate of the galaxy luminosity function

    We extend existing methods for using cross-correlations to derive redshift distributions for photometric galaxies, without using photometric redshifts. The model presented in this paper simultaneously yields highly accurate and unbiased redshift distributions and, for the first time, redshift-dependent luminosity functions, using only clustering information and the apparent magnitudes of the galaxies as input. In contrast to many existing techniques for recovering unbiased redshift distributions, the output of our method is not degenerate with the galaxy bias b(z), which is achieved by modelling the shape of the luminosity bias. We successfully apply our method to a mock galaxy survey and discuss improvements to be made before applying our model to real data.
    Comment: 14 pages, 7 figures. Replaced to match the version accepted by MNRAS
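
The degeneracy with the galaxy bias b(z) mentioned above can be illustrated with the standard clustering-redshift relation used by cross-correlation methods in general. It is not necessarily this paper's exact model. The relation assumes a reference sample s with known redshifts and the photometric sample p with bias b_p(z). Here \bar{w} denotes angular correlation amplitudes in redshift slice z_i, \bar{w}_m is the matter correlation amplitude, and all relations hold up to normalization:

```latex
\bar{w}_{ps}(z_i) \propto \frac{dN_p}{dz}(z_i)\, b_p(z_i)\, b_s(z_i)\, \bar{w}_{m}(z_i),
\qquad
\bar{w}_{ss}(z_i) \propto b_s^{2}(z_i)\, \bar{w}_{m}(z_i)
\quad\Longrightarrow\quad
\frac{\bar{w}_{ps}(z_i)}{\sqrt{\bar{w}_{ss}(z_i)\,\bar{w}_{m}(z_i)}} \propto \frac{dN_p}{dz}(z_i)\, b_p(z_i)
```

Only the product (dN_p/dz) b_p is constrained. Rescaling b_p(z) by any factor \lambda(z) while dividing dN_p/dz by \lambda(z) leaves the observable unchanged. The abstract states that this paper avoids the degeneracy by modelling the shape of the luminosity bias.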

    On the Minimum/Stopping Distance of Array Low-Density Parity-Check Codes

    In this work, we study the minimum/stopping distance of array low-density parity-check (LDPC) codes. An array LDPC code is a quasi-cyclic LDPC code specified by two integers q and m, where q is an odd prime and m <= q. In the literature, the minimum/stopping distance of these codes (denoted by d(q,m) and h(q,m), respectively) has been thoroughly studied for m <= 5. Both exact results, for small values of q and m, and general (i.e., independent of q) bounds have been established. For m=6, the best known minimum distance upper bound, derived by Mittelholzer (IEEE Int. Symp. Inf. Theory, Jun./Jul. 2002), is d(q,6) <= 32. In this work, we derive an improved upper bound of d(q,6) <= 20 and a new upper bound d(q,7) <= 24 by using the concept of a template support matrix of a codeword/stopping set. The bounds are tight with high probability in the sense that we have not been able to find codewords of strictly lower weight for several values of q using a minimum distance probabilistic algorithm. Finally, we provide new specific minimum/stopping distance results for m <= 7 and low-to-moderate values of q <= 79.
    Comment: To appear in IEEE Trans. Inf. Theory. The material in this paper was presented in part at the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, June/July 2014